NVIDIA-554: DPU-host mode: use ConfigMap for OVN feature enablement instead of per-node script gating#2944
NVIDIA-554: DPU-host mode: use ConfigMap for OVN feature enablement instead of per-node script gating#2944tsorya wants to merge 3 commits intoopenshift:masterfrom
Conversation
tsorya
commented
Mar 20, 2026
|
@tsorya: This pull request references NVIDIA-554 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughPer-node OVN_NODE_MODE-based feature gating was removed; feature enablement is now configured cluster-wide via the ovnkube ConfigMap. Startup scripts and control-plane manifests stop constructing/passing per-node feature flags; documentation and tests were updated to match the centralized configuration model. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.4)level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: inconsistent vendoring in :\n\tgithub.com/Masterminds/[email protected]: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/Masterminds/sprig/[email protected]: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/containernetworking/[email protected]: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ghodss/[email protected]: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/go-bindata/[email protected]+incompatible: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/onsi/[email protected]: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ope ... [truncated 17356 characters] ... ired in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/gengo/[email protected]: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/[email protected]: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/[email protected]: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/[email protected]: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/structured-merge-diff/[email protected]: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\n\tTo ignore the vendor directory, use -mod=readonly or -mod=mod.\n\tTo sync the vendor directory, run:\n\t\tgo mod vendor\n" Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: tsorya The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
bindata/network/ovn-kubernetes/common/008-script-lib.yaml (1)
701-724:⚠️ Potential issue | 🟠 MajorMulticast should be moved to
004-config.yamlper documented design.The documentation states that "Feature enablement (egress IP, multicast, multi-network, network segmentation, admin network policy, etc.) is managed through the cluster-wide ConfigMap (
004-config.yaml)", but--enable-multicastis still hardcoded as a CLI flag at line 723. The004-config.yamltemplates contain dozens ofenable-*settings (e.g.,enable-egress-ip,enable-multi-network,enable-admin-network-policy) butenable-multicastis absent. Either move multicast configuration into the config file to align with the documented design and other features, or update the documentation to clarify why multicast remains CLI-driven.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bindata/network/ovn-kubernetes/common/008-script-lib.yaml` around lines 701 - 724, The script currently hardcodes the CLI flag --enable-multicast in the ovnkube exec line; move multicast feature toggling into the cluster-wide ConfigMap like other features in 004-config.yaml. Add a ConfigMap/template key (e.g., enable-multicast) to 004-config.yaml, expose it as a rendered variable (e.g., multicast_enable_flag) alongside the existing flags (similar to network_observability_enabled_flag), then replace the literal --enable-multicast in the exec invocation with the variable ${multicast_enable_flag} (or remove it when false). Alternatively, if you intend multicast to stay CLI-driven, update the docs to explicitly state why --enable-multicast remains a hardcoded flag instead of being managed via 004-config.yaml.
🧹 Nitpick comments (1)
pkg/network/ovn_kubernetes_test.go (1)
296-306: Add one DPU-host regression case.These updated goldens prove the new
[ovnkubernetesfeature]contents, but they still don't lock in the behavior this PR is changing:dpu-hostshould stop rewriting feature flags while still changing gateway/interface selection,--ovnkube-node-mode, andinit_ovnkube_controller. A small render test aroundovnkube-lib.shwould make that contract much harder to regress.Also applies to: 342-352, 401-412, 463-474, 525-535, 586-596, 636-646, 689-699, 735-745, 782-793, 831-841, 877-887, 925-935, 972-982
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/network/ovn_kubernetes_test.go` around lines 296 - 306, Add a single regression test case in pkg/network/ovn_kubernetes_test.go that renders ovnkube-lib.sh for the dpu-host node mode: feed the test input containing the full [ovnkubernetesfeature] block shown in the diff, set node mode to "dpu-host", and assert that the output preserves the entire [ovnkubernetesfeature] block exactly (no feature flags rewritten) while still updating only the gateway/interface selection, the --ovnkube-node-mode value to "dpu-host", and the init_ovnkube_controller behavior; place the case alongside the existing render tests in the file that exercise ovnkube-lib.sh so it will fail if dpu-host rewrites feature flags.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@bindata/network/ovn-kubernetes/common/008-script-lib.yaml`:
- Around line 701-724: The script currently hardcodes the CLI flag
--enable-multicast in the ovnkube exec line; move multicast feature toggling
into the cluster-wide ConfigMap like other features in 004-config.yaml. Add a
ConfigMap/template key (e.g., enable-multicast) to 004-config.yaml, expose it as
a rendered variable (e.g., multicast_enable_flag) alongside the existing flags
(similar to network_observability_enabled_flag), then replace the literal
--enable-multicast in the exec invocation with the variable
${multicast_enable_flag} (or remove it when false). Alternatively, if you intend
multicast to stay CLI-driven, update the docs to explicitly state why
--enable-multicast remains a hardcoded flag instead of being managed via
004-config.yaml.
---
Nitpick comments:
In `@pkg/network/ovn_kubernetes_test.go`:
- Around line 296-306: Add a single regression test case in
pkg/network/ovn_kubernetes_test.go that renders ovnkube-lib.sh for the dpu-host
node mode: feed the test input containing the full [ovnkubernetesfeature] block
shown in the diff, set node mode to "dpu-host", and assert that the output
preserves the entire [ovnkubernetesfeature] block exactly (no feature flags
rewritten) while still updating only the gateway/interface selection, the
--ovnkube-node-mode value to "dpu-host", and the init_ovnkube_controller
behavior; place the case alongside the existing render tests in the file that
exercise ovnkube-lib.sh so it will fail if dpu-host rewrites feature flags.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5c1994eb-63ed-421c-ab98-8900f240f6ff
📒 Files selected for processing (10)
README.mdbindata/network/ovn-kubernetes/common/008-script-lib.yamlbindata/network/ovn-kubernetes/managed/004-config.yamlbindata/network/ovn-kubernetes/managed/ovnkube-control-plane.yamlbindata/network/ovn-kubernetes/self-hosted/004-config.yamlbindata/network/ovn-kubernetes/self-hosted/ovnkube-control-plane.yamldocs/architecture.mddocs/operands.mddocs/ovn_node_mode.mdpkg/network/ovn_kubernetes_test.go
💤 Files with no reviewable changes (3)
- README.md
- docs/architecture.md
- docs/operands.md
|
/retest-required |
|
/test ? |
|
oc exec -it -n openshift-ovn-kubernetes ovnkube-node-bpkjk -- bash [root@dhcp-8-231-235 ~]# ovn-nbctl find ACL | grep -i mcast match : "(ip4.mcast || mldv1 || mldv2 || (ip6.dst[120..127] == 0xff && ip6.dst[116] == 1))" match : "(ip4.mcast || mldv1 || mldv2 || (ip6.dst[120..127] == 0xff && ip6.dst[116] == 1))" match : "outport == @a15696136357465712812 && (igmp || (ip4.src == $a14513626322132345176 && ip4.mcast))" match : "outport == @a4743249366342378346 && (ip4.mcast || mldv1 || mldv2 || (ip6.dst[120..127] == 0xff && ip6.dst[116] == 1))" match : "inport == @a4743249366342378346 && (ip4.mcast || mldv1 || mldv2 || (ip6.dst[120..127] == 0xff && ip6.dst[116] == 1))" match : "inport == @a15696136357465712812 && ip4.mcast" 10.130.0.3 : unicast, xmt/rcv/%loss = 10/10/0%, min/avg/max/std-dev = 0.290/0.404/1.026/0.222 10.130.0.3 : multicast, xmt/rcv/%loss = 10/9/9% (seq>=2 0%), min/avg/max/std-dev = 0.307/0.522/1.115/0.319 10.128.0.30 : unicast, xmt/rcv/%loss = 10/10/0%, min/avg/max/std-dev = 0.314/0.449/0.717/0.140 10.128.0.30 : multicast, xmt/rcv/%loss = 10/9/9% (seq>=2 0%), min/avg/max/std-dev = 0.285/0.445/0.823/0.155 10.128.0.30 : unicast, xmt/rcv/%loss = 10/10/0%, min/avg/max/std-dev = 0.307/0.488/0.991/0.218 10.128.0.30 : multicast, xmt/rcv/%loss = 10/9/9% (seq>=2 0%), min/avg/max/std-dev = 0.313/0.499/1.212/0.273 10.129.0.7 : unicast, xmt/rcv/%loss = 10/10/0%, min/avg/max/std-dev = 0.307/0.462/1.057/0.252 10.129.0.7 : multicast, xmt/rcv/%loss = 10/9/9% (seq>=2 0%), min/avg/max/std-dev = 0.320/0.501/1.513/0.382 |
|
/retest |
f58272e to
7f928af
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/network/ovn_kubernetes_test.go`:
- Around line 298-308: Update the two failing tests to match the ConfigMap-based
migration: in TestOVNKubernetesControlPlaneFlags and
TestOVNKubernetesScriptLibCombined remove assertions that expect the removed CLI
flags and script-lib variables (the old enable-* CLI flags and corresponding
script-lib vars) and instead assert the presence/contents of the new
ovnkube.conf configmap golden (the block shown at lines 298–308). Also either
implement or remove references to the missing helpers
renderControlPlaneWithOverrides and renderScriptLibWithOverrides used by those
tests—if keeping them, implement helpers that render using the new ConfigMap
migration semantics and return the rendered ovnkube.conf/script-lib output;
otherwise delete calls to them and inline the appropriate rendering/assertion
using existing render helpers. Ensure all references are to the test names
TestOVNKubernetesControlPlaneFlags, TestOVNKubernetesScriptLibCombined and the
helper names renderControlPlaneWithOverrides/renderScriptLibWithOverrides so
reviewers can locate the edits.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d16edd6a-c511-46cc-ada2-5d68285bf3f3
📒 Files selected for processing (10)
README.mdbindata/network/ovn-kubernetes/common/008-script-lib.yamlbindata/network/ovn-kubernetes/managed/004-config.yamlbindata/network/ovn-kubernetes/managed/ovnkube-control-plane.yamlbindata/network/ovn-kubernetes/self-hosted/004-config.yamlbindata/network/ovn-kubernetes/self-hosted/ovnkube-control-plane.yamldocs/architecture.mddocs/operands.mddocs/ovn_node_mode.mdpkg/network/ovn_kubernetes_test.go
💤 Files with no reviewable changes (3)
- docs/architecture.md
- docs/operands.md
- README.md
✅ Files skipped from review due to trivial changes (3)
- bindata/network/ovn-kubernetes/managed/004-config.yaml
- docs/ovn_node_mode.md
- bindata/network/ovn-kubernetes/common/008-script-lib.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
- bindata/network/ovn-kubernetes/self-hosted/004-config.yaml
…nstead of per-node script gating
Feature flags (egress IP, multicast, multi-network, network
segmentation, admin network policy, multi-external-gateway, etc.)
are managed via the cluster-wide ConfigMap (004-config.yaml) passed
to ovnkube through --config-file. They do not need per-node gating
in the startup script.
OVN_NODE_MODE remains used only for DPU-host structural differences:
gateway interface, ovnkube-node-mode flag, and init-controller.
Also re-applies the feature gate cleanup from f5b8490 (removal of
OVN_ADMIN_NETWORK_POLICY_ENABLE template references) and removes
redundant CLI flags from 008-script-lib.yaml that duplicate what
the ConfigMap already provides.
Co-authored-by: Cursor <[email protected]>
Made-with: Cursor
This reverts commit 6f5697c.
Remove enable-multicast=true from ovnkube config maps and pass it directly as --enable-multicast on the ovnkube CLI for node and control plane processes (both self-hosted and managed). Made-with: Cursor
7f928af to
f93b3fc
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/network/ovn_kubernetes_test.go (1)
1020-1024:⚠️ Potential issue | 🟠 MajorTest expectation appears inconsistent with ConfigMap template changes.
The "enable network connect (DevPreview)" test case expected output (lines 1020-1024) is missing the egress feature flags (
enable-egress-ip=true,enable-egress-firewall=true,enable-egress-qos=true,enable-egress-service=true) and other flags (enable-multi-network=true,enable-admin-network-policy=true,enable-multi-external-gateway=true) that are now unconditionally emitted in the ConfigMap template per this PR.All other test cases in this file were updated to include these flags, but this one appears to have been missed. This will likely cause the test to fail.
🐛 Proposed fix to add missing flags
[ovnkubernetesfeature] +enable-egress-ip=true +enable-egress-firewall=true +enable-egress-qos=true +enable-egress-service=true egressip-node-healthcheck-port=9107 +enable-multi-network=true enable-network-segmentation=true enable-preconfigured-udn-addresses=true +enable-admin-network-policy=true +enable-multi-external-gateway=true enable-network-connect=true🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/network/ovn_kubernetes_test.go` around lines 1020 - 1024, The test case "enable network connect (DevPreview)" in ovn_kubernetes_test.go has an outdated expected ConfigMap snippet; update its expected output to include the unconditionally emitted flags now present in the template: add enable-egress-ip=true, enable-egress-firewall=true, enable-egress-qos=true, enable-egress-service=true, enable-multi-network=true, enable-admin-network-policy=true, and enable-multi-external-gateway=true to the expected config string for that test case so it matches the other updated cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@pkg/network/ovn_kubernetes_test.go`:
- Around line 1020-1024: The test case "enable network connect (DevPreview)" in
ovn_kubernetes_test.go has an outdated expected ConfigMap snippet; update its
expected output to include the unconditionally emitted flags now present in the
template: add enable-egress-ip=true, enable-egress-firewall=true,
enable-egress-qos=true, enable-egress-service=true, enable-multi-network=true,
enable-admin-network-policy=true, and enable-multi-external-gateway=true to the
expected config string for that test case so it matches the other updated cases.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 37e2203b-5661-4153-8807-475f50c6af4f
📒 Files selected for processing (10)
README.mdbindata/network/ovn-kubernetes/common/008-script-lib.yamlbindata/network/ovn-kubernetes/managed/004-config.yamlbindata/network/ovn-kubernetes/managed/ovnkube-control-plane.yamlbindata/network/ovn-kubernetes/self-hosted/004-config.yamlbindata/network/ovn-kubernetes/self-hosted/ovnkube-control-plane.yamldocs/architecture.mddocs/operands.mddocs/ovn_node_mode.mdpkg/network/ovn_kubernetes_test.go
💤 Files with no reviewable changes (3)
- README.md
- docs/architecture.md
- docs/operands.md
🚧 Files skipped from review as they are similar to previous changes (4)
- bindata/network/ovn-kubernetes/managed/ovnkube-control-plane.yaml
- bindata/network/ovn-kubernetes/self-hosted/004-config.yaml
- bindata/network/ovn-kubernetes/common/008-script-lib.yaml
- docs/ovn_node_mode.md
|
/verified by pre-merge testing. |
|
@yingwang-0320: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Remove TestOVNKubernetesControlPlaneFlags and TestOVNKubernetesScriptLibCombined which reference undefined helpers (renderControlPlaneWithOverrides, renderScriptLibWithOverrides) and assert on flags that were moved to the ConfigMap. Add missing unconditionally-emitted flags to the "enable network connect (DevPreview)" test case expected output. Made-with: Cursor
ed84508 to
526a5aa
Compare
|
/retest-required |
|
/retest-requires |
|
@tsorya hi, was there any QE testing done with this PR applied and automation/CI developed for the DPU-HOST mode? If so, would you mind pasting the testing result and CI links. I'd like to make sure we have a record for the testing on this PR and there is continuous CI job running to gate this functionality since we (core networking) currently don't have the necessary env nor CI to verify the piece of code for DPU functinoaliies. @yingwang-0320 Sicne this change touches the enablement/disablement of several OVN-K features, could you also run testing for each of the mentioned features and see if they are functional as expected? meanwhile, like @tssurya suggested, we would also want to try enable the DPU Host mode (without testing DPU specific functionalities) and scheck if the cluster is stable. |
|
@danwinship Would you mind taking a look at this PR once the testing is done? |
|
@tsorya we don't have any DPU-HOST mode related testing yet. Not sure what environment or card needed to enable DPU-host mode, let me confirm and update here. |
|
Confirmed with @tsorya , it's OK to verify CNO on regular mode. |
|
/retest-required |
|
@tsorya: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
| egress_features_enable_flag="" | ||
|
|
||
| # disable multicast for dpu-host mode as it is not supported | ||
| enable_multicast_flag="" |
There was a problem hiding this comment.
The PR/commit doesn't actually explain why we can get rid of all of this. Do we implement egress IPs and multicast in DPU mode now? Or does DPU mode just ignore the settings of these flags?
| ${export_network_flows_flags} \ | ||
| ${multi_network_enabled_flag} \ | ||
| ${network_segmentation_enabled_flag} \ | ||
| ${network_connect_enabled_flag} \ |
There was a problem hiding this comment.
You're removing ${network_connect_enabled_flag} here but you still set it above.
| {{- if not .OVN_MULTI_NETWORK_ENABLE }} | ||
| enable-multi-network=true | ||
| {{- end }} |
There was a problem hiding this comment.
OK, so the handling of enable-multi-network is totally broken; we enable it whether OVN_MULTI_NETWORK_ENABLE is true or false.
It looks like when the "network segmentation" stuff was added, we made it so DisableMultiNetwork was ignored if the network segmentation feature gate was enabled, and then when that feature gate was removed, we ended up with a bunch of places doing "enable multi-network if OVN_MULTI_NETWORK_ENABLE is true and also enable multi-network if OVN_MULTI_NETWORK_ENABLE is false"...
You should remove all of the DisableMultiNetwork/OVN_MULTI_NETWORK_ENABLE code from the tree. (Ideally, that would be a separate commit before this one.)
| ${multi_network_enabled_flag} \ | ||
| ${network_segmentation_enabled_flag} \ | ||
| ${gateway_mode_flags} \ | ||
| ${network_connect_enabled_flag} \ |
There was a problem hiding this comment.
You removed the setting of network_connect_enabled_flag but you forgot to remove it here.
| {{- end }} | ||
| enable-admin-network-policy=true | ||
| enable-multi-external-gateway=true | ||
| enable-multicast=true |
There was a problem hiding this comment.
(I can't comment on the commit message directly; you should explain there that the issue is that there isn't an enable-multicast config file option. The flag can only be set via the CLI.)
| }) | ||
| } | ||
|
|
||
| func TestOVNKubernetesControlPlaneFlags(t *testing.T) { |
There was a problem hiding this comment.
This whole commit should be squashed into the first commit. It doesn't make sense to commit a fix that breaks the unit tests first, and then have a second commit to unbreak them.